We study electronic transport in long DNA chains using the tight-binding approach for a ladder-like model of
DNA. We find insulating behavior with localization lengths $\xi \approx 25$ in units of average base-pair separation.
Furthermore, we observe small, but significant differences between λ-DNA, centromeric DNA, promoter
sequences as well as random-ATGC DNA.



1 Introduction DNA is a macro-molecule consisting of repeated stacks of bases formed by either AT
(TA) or GC (CG) pairs coupled via hydrogen bonds and held in the double-helix structure by a sugar-phosphate
backbone. In most models of electronic transport [1,2] it has been assumed, following earlier
pioneering work [3,4], that the transmission channels are along the long axis of the DNA molecule and
that the conduction path is due to $\pi$-orbital overlap between consecutive bases [5].

A simple quasi-1D model incorporating these aspects has been recently introduced [6], building on an
earlier, even simpler 1D model [1]. For the model, electronic transport properties have been investigated in
terms of localisation lengths [6,7], crudely speaking the length over which electrons travel. Various types
of disorder, including random potentials, had been employed to account for different real environments. It
was found that random and λ-DNA have localisation lengths allowing for electron motion among a few
dozen base pairs only. However, poly(dG)-poly(dC) and also telomeric DNA have much larger electron
localization lengths. In Ref. [6], a novel enhancement of localisation lengths has been observed at particular
energies for an increasing binary backbone disorder.

2 The DNA tight-binding model A convenient tight-binding model for DNA can be constructed as
follows: it has two central conduction channels in which individual sites represent an individual base; 
these are interconnected and further linked to upper and lower sites, representing the backbone, but are not 
interconnected along the backbone. Every link between sites implies the presence of a hopping amplitude. 
The Hamiltonian for this ladder-like model is given by 



$$
H_L = \sum_{i=1}^{L} \Big[ \sum_{\tau=1,2} \big( t_{i,\tau} |i,\tau\rangle\langle i+1,\tau| + \varepsilon_{i,\tau} |i,\tau\rangle\langle i,\tau| \big) + \sum_{q=\uparrow,\downarrow} \big( t_i^q |i,\tau\rangle\langle i,q(\tau)| + \varepsilon_i^q |i,q\rangle\langle i,q| \big) + t_{1,2} |i,1\rangle\langle i,2| \Big] + \mathrm{h.c.}
$$

where $t_{i,\tau}$ is the hopping amplitude between sites along each branch $\tau = 1, 2$ and $\varepsilon_{i,\tau}$ is the corresponding
onsite potential energy; $t_i^q$ and $\varepsilon_i^q$ give hopping amplitudes and onsite energies at the backbone sites.
Also, $q(\tau) = \uparrow, \downarrow$ for $\tau = 1, 2$, respectively. The parameter $t_{1,2}$ represents the hopping between the two
central branches, i.e., perpendicular to the direction of conduction. Quantum chemical calculations with
semi-empirical wave function bases using the SPARTAN package [8] suggest that this value, dominated
by the wave function overlap across the hydrogen bonds, is weak and so we choose $t_{1,2} = 1/10$. As
we restrict our attention here to pure DNA, we also set $\varepsilon_{i,\tau} = 0$ for all $i$ and $\tau$.

This ladder model clearly represents a dramatic simplification of DNA. Nevertheless, in Ref. [1] it had been
shown that an even simpler model, in which base pairs are combined into a single site, when applied
to an artificial sequence of repeated GC base pairs, poly(dG)-poly(dC) DNA, reproduces experimental
current-voltage measurements when $t_i = 0.37$ eV and $t_i^q = 0.74$ eV are used. This motivates the
above parametrization of $t_i^q = 2 t_i$ and $t_{i,\tau} \equiv 1$ for hopping between like (GC/GC, AT/AT) pairs. Assuming
that the wave function overlap between consecutive bases along the DNA strand is weaker between unlike
and non-matching bases (AT/GC, TA/GC, etc.), we choose $t_{i,\tau} = 1/2$ in that case. Furthermore, since the energetic
differences in the adiabatic electron affinities of the bases are small [9], we choose $\varepsilon_i = 0$ for all $i$. Due to
the non-connectedness of the backbone sites along the DNA strands, the ladder model can be further simplified
to yield a model in which the backbone sites are incorporated into the electronic structure of the DNA. The
effective ladder model reads as

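$$
H'_L = \sum_{i=1}^{L} \Big[ t_{1,2} |i,1\rangle\langle i,2| + \sum_{\tau=1,2} \Big( t_{i,\tau} |i,\tau\rangle\langle i+1,\tau| + \Big[ \varepsilon_{i,\tau} - \frac{\big(t_i^{q(\tau)}\big)^2}{\varepsilon_i^{q(\tau)} - E} \Big] |i,\tau\rangle\langle i,\tau| \Big) \Big] + \mathrm{h.c.}
$$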

Thus the backbone has been incorporated into an energy-dependent onsite potential on the main DNA sites.
This effect is at the heart of the enhancement of localization lengths due to increasing binary backbone 
disorder reported previously [6]. 
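The parameter choices and the backbone elimination described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, "like" neighbours are taken to mean identical base letters on consecutive sites of a strand (the text does not fix this detail), and the energy-dependent shift is the standard result of eliminating a single site with onsite energy $\varepsilon^q$ that is coupled by $t^q$ to a base site.

```python
def strand_hoppings(strand, t_like=1.0, t_unlike=0.5):
    """Hopping t_{i,tau} between consecutive bases of one strand.

    Assumption (not fixed by the text): two neighbouring sites count as
    "like" when they carry the same base letter.
    """
    return [t_like if a == b else t_unlike
            for a, b in zip(strand, strand[1:])]


def effective_onsite(energy, eps_base, t_backbone, eps_backbone):
    """Onsite energy of a base site after eliminating its backbone site.

    The backbone amplitude obeys (E - eps_q) * psi_q = t_q * psi_base, so
    substituting it back shifts eps_base by -t_q**2 / (eps_q - E).
    """
    return eps_base - t_backbone ** 2 / (eps_backbone - energy)


# Illustrative values only: hoppings along a short strand and the
# effective potential at one energy.
hops = strand_hoppings("GGCA")
shift = effective_onsite(energy=2.0, eps_base=0.0,
                         t_backbone=1.0, eps_backbone=0.0)
```

The effective potential diverges as $E$ approaches the backbone onsite energy, which is one way to see why the backbone separates the spectrum into two bands.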

3 λ-DNA, centromeres and promoters We shall use two naturally occurring DNA sequences ("strings").
(i) λ-DNA [28] is DNA from the bacteriophage lambda virus. It has a sequence of 48502 base pairs and is
biologically very well characterised. Its ratio $\alpha$ of like to unlike base pairs is $\alpha_\lambda = 0.949$. (ii) Centromeric DNA
for chromosome 2 of yeast has 813138 base pairs [29] and $\alpha_{\mathrm{centro.}} = 0.955$. This DNA is also rich in AT
bases and has a high rate of repetitions, which should be favourable for electronic transport.

Another class of naturally existing DNA strands is provided by so-called promoter sequences. We
use a collection of 4986 of these which have been assembled from the TRANSFAC database and cover a
range of organisms such as mouse, human, fly, and various viruses. Promoter sequences are biologically
very interesting because they represent those places along a DNA string where polymerase enzymes bind
and start the copying process that eventually leads to synthesis of proteins. On average, these promoters
consist of approximately 17 base pairs, much too short for a valid localization length analysis by TMM.
Therefore, we concatenate them into an 86827 base-pair long super-promoter with $\alpha_{\mathrm{super-p.}} = 0.921$.
In order to obtain representative results, 100 such super-promoters have been constructed, representing
different random arrangements of the promoters, and the results presented later will be averages.¹
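The construction of one super-promoter can be sketched as follows. This is a minimal illustration, assuming the promoters are available as plain strings; the function name `super_promoter` and the toy promoter list are hypothetical and not taken from the TRANSFAC data.

```python
import random


def super_promoter(promoters, rng):
    """Concatenate all promoters in a random order into one long string."""
    order = list(promoters)   # copy, so the caller's list is left untouched
    rng.shuffle(order)        # one random arrangement of the promoters
    return "".join(order)


# Toy example with three made-up promoter strings.
toy_promoters = ["TATAAT", "GGCC", "ATG"]
arrangements = [super_promoter(toy_promoters, random.Random(seed))
                for seed in range(5)]
```

Each arrangement keeps every promoter intact and only changes their order, so all arrangements have the same length and base composition.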

Occasionally, we show results for "scrambled" DNA. This is DNA with the same number of A, T, C, G
bases, but with their order randomised. Clearly, such sequences contain the same set of electronic potentials
and hopping variations, but would perform quite differently in a biological context. A comparison of their
transport properties with those from the original sequence thus allows us to measure how important the exact
fidelity of a sequence is. On average, we find for these sequences $\alpha_{\lambda/S} = 0.899$, $\alpha_{\mathrm{centro.}/S} = 0.9951$ and
$\alpha_{\mathrm{super-p.}/S} = 0.901$.



¹ Averages of $\xi$ are computed by averaging $1/\xi$.
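Scrambling and the averaging rule of the footnote can be illustrated together. The sketch below assumes that a scrambled sequence is a uniformly random permutation of the bases of one strand, and that the quoted average is the reciprocal of the mean of $1/\xi$; both function names are hypothetical.

```python
import random
from collections import Counter


def scramble(sequence, rng):
    """Return the same bases in a random order (composition is preserved)."""
    bases = list(sequence)
    rng.shuffle(bases)
    return "".join(bases)


def average_localisation_length(xis):
    """Average xi by averaging 1/xi and inverting the result (assumed)."""
    mean_inverse = sum(1.0 / x for x in xis) / len(xis)
    return 1.0 / mean_inverse


original = "AATTGGCCATGC"
shuffled = scramble(original, random.Random(1))
same_composition = Counter(shuffled) == Counter(original)

# Two illustrative values: the mean of 1/10 and 1/40 is 1/16.
xi_avg = average_localisation_length([10.0, 40.0])
```

Averaging the inverse gives more weight to short localisation lengths than a plain arithmetic mean of $\xi$ would.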
Fig. 1 Localization lengths $\xi$ versus Fermi energy $E$ for various clean DNA (solid symbols), scrambled DNA
(DNA/S; open ○, □, ◇) and promoted DNA (DNA/P; open △) strands. Only every 10th (20th) symbol is shown
for clean (scrambled/promoted) DNA. Error bars reflect the standard deviation after sampling the different sequences
for random-ATGC, scrambled and promoted DNA.

A convenient choice of artificial DNA strand is a simple, 100000 base-pair long random sequence of
the four bases, random-ATGC DNA, which we construct with equal probability for all 4 bases ($\alpha_{\mathrm{random}} =
0.901$). We shall also 'promote' these random DNA strings by inserting all 4986 promoter sequences at
random positions in the random-ATGC DNA ($\alpha_{\mathrm{random/P}} = 0.910$).
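A random-ATGC strand and its promoted version might be generated along the following lines. This is a sketch under assumptions the text does not spell out: promoters are inserted (lengthening the strand) rather than overwriting bases, and insertion positions are drawn uniformly with respect to the original random strand. The names `random_atgc` and `promote` are hypothetical.

```python
import random


def random_atgc(length, rng):
    """Random strand with equal probability for A, T, G and C."""
    return "".join(rng.choice("ATGC") for _ in range(length))


def promote(host, promoters, rng):
    """Insert every promoter at a random position of the host strand."""
    # Draw one insertion point per promoter, then splice from left to right
    # so that earlier insertions do not shift later positions.
    placed = sorted((rng.randint(0, len(host)), p) for p in promoters)
    pieces, last = [], 0
    for position, promoter in placed:
        pieces.append(host[last:position])
        pieces.append(promoter)
        last = position
    pieces.append(host[last:])
    return "".join(pieces)


rng = random.Random(42)
host = random_atgc(50, rng)
promoted = promote(host, ["TATA", "GGGCCC"], rng)
```

With this choice the promoted strand is longer than the host by the total promoter length, and each promoter appears as one contiguous block.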

4 Results for localization lengths For studying the transport properties of the ladder model, we use a variant
of the iterative transfer-matrix method (TMM) [10-14]. The TMM allows us to determine the localisation
length $\xi$ of electronic states in the present system with fixed cross section $M = 2$ (ladder) and length
$L \gg M$. Traditionally, a few million sites are needed for $L$ to achieve reasonable accuracy for $\xi$. However,
in the present situation we are interested in finding $\xi$ also for much shorter DNA strands, typically only
a few ten thousand base pairs long. Thus, in order to restore the required precision, we have
modified the conventional TMM and can now perform TMM on a system of fixed length $L_0$ by repeating
forward- and backward-TMM steps [6, 15-17].
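For orientation, a conventional single-pass TMM for a two-leg ladder can be sketched as below. It is not the forward-backward variant used here, only the textbook iteration with reorthonormalisation after every rung; all names are hypothetical, and the hopping into the first rung is an assumed value of 1. The rung equation used is $E\psi_{n,\tau} = \varepsilon_{n,\tau}\psi_{n,\tau} + t_{1,2}\psi_{n,\bar\tau} + t_{n,\tau}\psi_{n+1,\tau} + t_{n-1,\tau}\psi_{n-1,\tau}$, with $\bar\tau$ denoting the opposite leg and $\varepsilon_{n,\tau}$ standing for the (effective) onsite energy.

```python
import math


def orthonormalise(cols):
    """Modified Gram-Schmidt; returns orthonormal vectors and their norms."""
    basis, norms = [], []
    for v in cols:
        w = list(v)
        for q in basis:
            overlap = sum(a * b for a, b in zip(q, w))
            w = [a - overlap * b for a, b in zip(w, q)]
        norm = math.sqrt(sum(a * a for a in w))
        norms.append(norm)
        basis.append([a / norm for a in w])
    return basis, norms


def transfer(v, energy, eps, t_next, t_prev, t12):
    """Advance (psi_n, psi_{n-1}) to (psi_{n+1}, psi_n) across one rung."""
    a1, a2, b1, b2 = v
    c1 = ((energy - eps[0]) * a1 - t12 * a2 - t_prev[0] * b1) / t_next[0]
    c2 = ((energy - eps[1]) * a2 - t12 * a1 - t_prev[1] * b2) / t_next[1]
    return [c1, c2, a1, a2]


def localisation_length(energy, eps, t_along, t12=0.1):
    """Estimate xi as the inverse of the smallest Lyapunov exponent.

    eps[n] and t_along[n] are pairs (leg 1, leg 2) for rung n;
    t_along[n] couples rung n to rung n + 1.
    """
    cols = [[1.0 if i == j else 0.0 for i in range(4)] for j in range(4)]
    log_sum = [0.0] * 4
    t_prev = (1.0, 1.0)  # assumed hopping into the first rung
    for eps_n, t_n in zip(eps, t_along):
        cols = [transfer(v, energy, eps_n, t_n, t_prev, t12) for v in cols]
        cols, norms = orthonormalise(cols)
        log_sum = [s + math.log(r) for s, r in zip(log_sum, norms)]
        t_prev = t_n
    gammas = [s / len(eps) for s in log_sum]
    return 1.0 / min(abs(g) for g in gammas)


# Sanity check on a uniform ladder at an energy outside the bands.
n_rungs = 4000
xi = localisation_length(3.0, [(0.0, 0.0)] * n_rungs, [(1.0, 1.0)] * n_rungs)
```

For the uniform ladder in the last lines the slowest decay follows from $\cosh\kappa = (E - t_{1,2})/2 = 1.45$, i.e. $\kappa = \ln 2.5$, so the estimate should approach $1/\ln 2.5 \approx 1.09$. For disordered sequences the accuracy of such a single pass is limited by the strand length, which is the motivation for the modified scheme.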

We have computed the energy dependence of the localization lengths for all sequences of Section 3.
In addition, λ-DNA, centromeric DNA and the super-promoter DNA were also scrambled 100 times,
the localization length of each resulting sequence was measured, and the appropriate average constructed.
Also, we constructed 100 promoted random-ATGC DNA sequences. As shown previously [6], the energy
dependence of $\xi$ reflects the backbone-induced two-band structure. The obtained $\xi(E)$ values for the
lower band are shown in Fig. 1. In the absence of any onsite disorder, we find two prominent peaks
separated by $t_{1,2}$, and $\xi(E) = \xi(-E)$. We also see that λ-DNA has roughly the same $\xi(E)$ dependence
as random-ATGC DNA. Promoting a given DNA sequence leads to small increases in localization length,
whereas scrambling can lead to an increase as well as a decrease. The super-promoter has larger $\xi$ values
compared to random-ATGC and λ-DNA. Most surprisingly, centromeric DNA, the longest investigated
DNA sequence, has a much larger localization length than all other DNA sequences, and this even
increases after scrambling.

5 Conclusions We have shown that the ladder model is a simple, yet non-trivial representation of
DNA within the tight-binding formalism. While keeping the number of parameters small, we manage
to reproduce the wide-gap structure observed in much more accurate quantum chemical calculations of
short DNA strands [1, 18-21]. In order to study the transport properties, we employ a variant of the TMM
which provides useful information about the spatial extent $\xi$ of electronic states along a DNA strand in the
quantum regime at $T = 0$. We note that the values of $\xi$ which we find are around 25 base-pair separations in the band,
which is surprisingly close to the distances found in studies of the range dependence of electron transfer [5,22-27].

From our results, we find clear differences in localization lengths which are not simply related to a
difference in DNA composition, but also reflect the order of base pairs. Still, the differences are within
10-20% and it remains unclear how relevant these findings are biologically, i.e., whether electronic
transport plays a role in the biological mechanism of DNA repair and protein generation.

Acknowledgements This work has been supported in part by the Royal Society. We thank A. Croy and C. 
Sohrmann for useful discussions. 

References 

[1] G. Cuniberti, L. Craco, D. Porath, and C. Dekker, Phys. Rev. B 65, 241314 (2002).

[2] J. Zhong, in Proceedings of the 2003 Nanotechnology Conference, Computational Publications, edited by M.
Laudon and B. Romanowicz (PUBLISHER, ADDRESS, 2003), Vol. 2, pp. 105-108, (Molecular and Nano
Electronics).

[3] J. Ladik, M. Seel, P. Otto, and A. Bakhshi, Chem. Phys. 108, 203 (1986).

[4] A. Bakhshi, P. Otto, J. Ladik, and M. Seel, Chem. Phys. 108, 215 (1986).

[5] C. R. Treadway, M. G. Hill, and J. K. Barton, Chem. Phys. 281, 409 (2002).

[6] D. K. Klotsa, R. A. Römer, and M. S. Turner, Biophys. J. 89 (2005).

[7] H. Yamada, Phys. Lett. A (2004).

[8] SPARTAN version 5.0, User's Guide, Wavefunction Inc., 18401 Von Karman Ave., Suite 370, Irvine, CA 92612.

[9] S. S. Wesolowski, M. L. Leininger, P. N. Pentchev, and H. F. Schaefer III, J. Am. Chem. Soc. 123, 4023 (2001).

[10] J.-L. Pichard and G. Sarma, J. Phys. C 14, L127 (1981).

[11] J.-L. Pichard and G. Sarma, J. Phys. C 14, L617 (1981).

[12] A. MacKinnon and B. Kramer, Z. Phys. B 53, 1 (1983).

[13] B. Kramer and A. MacKinnon, Rep. Prog. Phys. 56, 1469 (1993).

[14] A. MacKinnon, J. Phys.: Condens. Matter 6, 2511 (1994).

[15] K. Frahm, A. Müller-Groeling, J. L. Pichard, and D. Weinmann, Europhys. Lett. 31, 169 (1995).

[16] R. A. Römer and M. Schreiber, Phys. Rev. Lett. 78, 4890 (1997).

[17] M. L. Ndawana, R. A. Römer, and M. Schreiber, Europhys. Lett. 68, 678 (2004).

[18] O. R. Davies and J. E. Inglesfield, Phys. Rev. B 69, 195110 (2004).

[19] V. Bhalla, R. P. Bajpai, and L. M. Bharadwaj, European Molecular Biology Reports 4, 442 (2003).

[20] I. L. Garzon et al., Nanotechnology 12, 126 (2001).

[21] A. Rakitin et al., Phys. Rev. Lett. 86, 3670 (2001).

[22] C. Wan et al., Proc. Natl. Acad. Sci. 97, 14052 (2000).

[23] E. Boon et al., Proc. Natl. Acad. Sci. 100, 12543 (2003).

[24] S. O. Kelley and J. K. Barton, Science 283, 375 (1999).

[25] C. J. Murphy et al., Science 262, 1025 (1993).

[26] M. A. O'Neil, C. Dohno, and J. K. Barton, Journal of the American Chemical Society Communications 126,
1316 (2004).

[27] S. Delaney and J. K. Barton, J. Org. Chem. 68, 6475 (2003).

[28] Bacteriophage lambda, complete genome [gi|9626243|ref|NC_001416.1|[9626243]], Genbank
Accession number NC_001416, http://www.ncbi.nlm.nih.gov/entrez/

[29] CEN2, Chromosome II centromere, http://www.yeastgenome.org/



© 2003 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim


